Question 1

Visualization Title & Link:

Title: “X-Men 97 Has been a favorite of Both Fans & Critics”

Source: https://www.reddit.com/r/dataisbeautiful/comments/1cel54l/oc_how_does_xmen_97_stack_up_against_past_xmen/

Image:

A. Describing the context

Who is the audience or audiences?:
As the visualization title says, the target audiences are both fans and critics of X-Men. It is also useful for people who are interested in comics and superhero movies.

What is the action the visualization is aiming for? Consider each audience here:

  • To agree that X-Men 97 is the favorite X-Men series of both fans and critics.

  • To learn the Rotten Tomatoes scores of the X-Men movies and shows.

When can the communication happen, and what tools have been used to suggest an order:

  • All information is given in a static image.

  • The communication starts from the picture of the comic book character in the visualization, which clearly shows that the information is about X-Men. After that, it does not follow a clear path. The author uses different visual elements to attract attention, such as a title that gives clear information and the logos of the different movies.

How has the data been used to convey the action?:
A scatter plot is used to convey the action. Each point in the plot represents a different X-Men movie or series. The x-axis gives the Rotten Tomatoes critics' score and the y-axis represents the audience score. The movie/show logo is used in place of each dot, which makes the visualization easier to understand.

B. Genre

Which of the seven genres listed above best describes the data visualization?

Annotated Chart.

C. Author-driven vs Reader-driven

Where on the spectrum from author- to reader-driven is this visualization?

Reader-driven, as the information can be read in any order.

Question 2

The reading used for this is Davis et al. (2015).

a) What is the main argument of the paper? The main argument of this paper is that science communicators treat visual material as an optional add-on instead of an integrated part of the communication, and that they fail to identify the target audience and to design visual elements for it.

b) According to this paper, why is effective visual communication important (or not)? According to this paper, we live in an atmosphere soaked with visual elements, so visual language is a potentially powerful tool for communication in the 21st century. It helps to improve the explanation and understanding of scientific matters.

c) What are the key elements, considerations, or factors to be considered for effective visual communication addressed in the paper? Do you disagree with any?

  1. Identify the audience. Science communicators need to consider to whom they are communicating. By understanding the audience, they can tailor the visual elements to its level of knowledge. For example, when the audience is the general public, the science visualization needs to include more detailed explanations.

  2. User-centered design for science communication. When designing a science visualization, the audience should be the author's primary focus.

  3. Using tools from graphic design to enhance the visualization. This means implementing styles and techniques such as symbols, abstract images and symbolic notation. These gain the attention of readers and help the graphic show only the intended message.

I agree with all of these factors outlined in the paper.

d) What pitfalls are identified in the paper that can be avoided if we use effective visual communication?

  • Failing to identify the audience.

  • Treating visual elements as an optional add-on rather than an integrated part of the communication.

  • Lacking knowledge of visual literacy and design principles.

Question 3

head(trees)
#Code for scatterplot
ggplot(trees, aes(x= Height,y=Girth)) +
  geom_point()+geom_smooth(method=lm)+
  labs(title="Variation in Tree Height with Girth",x = "Tree Height",y="Tree Girth",caption = "Tree Girth = Circumference of tree trunk")+
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

The data set used for this visualization is trees, which contains measurements of tree girth, height and volume. The plot type is a scatter plot (helpful for visualizing continuous data); it is useful for showing the variation in tree height and its relationship to girth. The variables used are girth on the y-axis and height on the x-axis, which allows us to compare tree heights across different girths. The title helps the reader understand the story of the visualization. In this plot, the minimal theme has been used to increase the visual quality (it increases the data-ink ratio). Additional information about the variable girth is given in the caption, which helps the reader to interpret the data.
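
The relationship drawn by the smoothing line can also be summarised numerically. The following is a minimal sketch, not part of the original answer: it uses the built-in `trees` data and the same roles as the plot (height on x, girth on y). The object names `r`, `fit` and `slope` are only illustrative.

```r
# trees ships with base R, so no extra packages are needed
data(trees)

# Pearson correlation between the two plotted variables
r <- cor(trees$Height, trees$Girth)

# Straight-line fit matching geom_smooth(method = lm) with y = Girth, x = Height
fit <- lm(Girth ~ Height, data = trees)
slope <- coef(fit)[["Height"]]

cat("correlation:", round(r, 2), "\n")
cat("slope (girth units per unit of height):", round(slope, 3), "\n")
```

For a simple linear regression the slope equals the correlation multiplied by sd(y) / sd(x), so the two numbers tell a consistent story about how strongly girth changes with height.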

Question 4

apple_subset <- apple_mobility %>% 
      #%in% keeps every row whose country is one of the four; == would recycle the vector and silently drop rows
      filter(country %in% c('Australia','India','United Kingdom','Italy'),transportation_type == 'driving')%>%drop_na()
#Extracting the year from the date
Year<-apple_subset$date%>%format(format='%Y')
#Plotting grouped bar plot
ggplot(apple_subset, aes(x = Year, y = score, fill = country)) +
  geom_bar(position = "dodge",stat = "identity") +
  labs(
    title = "Driving Mobility Trends for Selected Countries",
    x = "Year",
    y = "score",
    fill = "Country"
  ) +
  theme_minimal()

The Apple mobility data set is extremely large, so a line plot or an area plot would not show the trend clearly (the series overlap). Therefore, I used a grouped bar plot, which is an effective way to plot a categorical variable (x) against a continuous variable (y). The variables used in this plot are the year and the driving mobility score, which allows us to compare the difference in driving mobility between four countries. I used a minimal theme to improve the visual quality of my visualization and used color to represent each country. Moreover, converting the dates to years helped to reduce the clutter and improve the visual quality.
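
When a subset has to keep several countries, it matters whether the comparison uses `==` or `%in%`. The sketch below uses a small made-up vector (not the Apple mobility data) to show the difference; the names `country`, `wanted`, `eq_match` and `in_match` are only illustrative.

```r
# Made-up data: five rows, two wanted countries
country <- c("India", "Italy", "India", "Italy", "Spain")
wanted  <- c("Italy", "India")

# == recycles `wanted` along `country` position by position
# (and warns, because 5 is not a multiple of 2)
eq_match <- suppressWarnings(country == wanted)

# %in% asks, for each row, whether its country belongs to the set
in_match <- country %in% wanted

print(eq_match)
print(in_match)
```

With `==`, a row is kept only if it happens to line up with the recycled vector, so rows can be lost without any error. `%in%` tests membership and keeps every matching row.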

Question 5

Visualization 1

#The data set used in these is UCBAdmissions
#Converting it into a data frame
data<-as.data.frame(UCBAdmissions)
head(data)
#subset of the data
subset<-data%>%filter(Admit == "Admitted")

#Bar plot
ggplot(subset,aes(y=Freq,x=Dept,fill=Gender))+
  geom_bar(stat="Identity",position="dodge")+#Creating the bar plot
  geom_text(aes(y=Freq,x=Dept,label=Freq), position = position_dodge(width = 0.9), vjust = -0.5)+#Adding the number of admitted students into bar plots
  theme_classic()+ 
  scale_y_continuous(breaks=NULL)+scale_fill_manual(values=c("black","pink"))+
  labs(x="Departments",y="Frequency",title = "Admitted students by gender and department in UCB")

Plot type:- Grouped bar plot. It is helpful for comparing the different groups (gender). The data set contains categorical and continuous variables, so a bar plot is more effective. The variables I chose for this visualization are frequency, department and gender. Together they show the number of students admitted to each department by gender.

Aesthetic choices:- In this visualization, I used two different colors for the two genders, so the reader can easily tell them apart. There are no grid lines and no unnecessary use of color, which helps to increase the data-ink ratio. The exact number of admitted students is given on top of each bar, which is helpful for the reader. An appropriate title is given to the plot so that its subject can be understood at a glance.

Visualisation 2

data<-as.data.frame(UCBAdmissions)
Box_plot<-ggplot(data,aes(x=Gender,y=Freq,fill=Admit))+
  geom_boxplot()+
  labs(x="Gender",y="Frequency",title = "Acceptance and Rejection Rate By Gender")+ #Adding labels
  theme(panel.background=element_rect(fill="white"))+theme_classic()
#changing the default color of box plot
Box_plot+scale_fill_manual(values=c("Admitted" = "green", "Rejected" = "red"))

The plot used in this visualization is a box plot. It visualizes acceptance and rejection by gender. The box plot is an appropriate choice for comparing distributions and identifying any differences in acceptance and rejection between genders. The x-axis represents gender and the y-axis represents frequency. The audience can notice a significant difference in acceptance by gender. Instead of using the default colors, I used green for acceptance and red for rejection. This makes it easier to interpret the data at a glance.
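
Because the y-axis of the box plot shows frequencies, it can help to compute the actual acceptance and rejection rates alongside it. This is a minimal sketch using only base R and the built-in `UCBAdmissions` table; the names `by_gender` and `rates` are illustrative and not part of the original answer.

```r
# UCBAdmissions ships with base R (datasets package)
# Collapse over departments: rows = Admit, columns = Gender
by_gender <- margin.table(UCBAdmissions, margin = c(1, 2))

# Column-wise proportions give the admitted/rejected share within each gender
rates <- prop.table(by_gender, margin = 2)

print(round(rates, 3))
```

Each column of `rates` sums to 1, so the "Admitted" row can be read directly as the acceptance rate for that gender.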

Question 6

#Our data
gender_pay_gap <- read.csv(
  "https://raw.githubusercontent.com/plotly/datasets/master/school_earnings.csv")
gender_pay_gap
#creating the first bubble plot
ggplot(gender_pay_gap,aes(x=Men,y=Women,size =Gap))+
  geom_point(alpha =0.7) + geom_text_repel(aes(label = School),nudge_x = 1,nudge_y = 2,hjust=0.5,vjust=0.5)+labs(x="Men's pay",y="Women's Pay",title ="Comparing Gender Pay gaps by University")+
  theme_classic()

#Second Bubble plot
#thresholds used to label a pay gap as high or low
max_gap<-50
min_gap<-15
#altering the dataset
gender_pay_gap$GapLevel<- with(gender_pay_gap, 
  ifelse(Gap <= min_gap, "Low Gap", 
    ifelse(Gap >= max_gap, "High Gap", "Medium Gap")))
#code for the bubble plot
ggplot(gender_pay_gap,aes(x=Men,y=Women,colour = GapLevel))+
  geom_point(alpha =0.7) + geom_text(aes(label = School),nudge_x = 2,nudge_y = 2,hjust=0.5,vjust=0.5)+labs(x="Men's pay",y="Women's Pay",title ="Comparing Gender Pay gaps by University")+
#then,we add some visual elements to increase effectiveness
  scale_color_manual(values = c("Low Gap" = "blue", "Medium Gap" = "grey90", "High Gap" = "red"))+
  theme_classic()

Part 1 The first thing the reader notices in the visualization is the different colors given to the gap levels, because color is a strong preattentive attribute. Thanks to the color, the reader can see at a glance which universities have the highest and lowest gender pay gaps.

Part 2 In the first plot, the gap is represented using the preattentive attribute of size. It helps to show the quantitative information accurately.

Part 3 The first plot can be used by people who study universities, policy makers and companies. The target audience for the second plot is the general public. It mainly shows which universities have the highest and lowest gender pay gaps.

Question 7

#Checking the dataset
head(CO2)
#creating a new dataset
new_data<-CO2%>%mutate(names = Plant)
head(new_data)
#creating the plot
ggplot(new_data, aes(x = conc, y = uptake, color = Type)) +
    #linewidth replaces the deprecated size argument for lines
    geom_line(data = new_data %>% dplyr::select(-Plant), aes(group = names), color = "grey", linewidth = 0.5, alpha = 0.5) +
    geom_line() +
  #some visual element to improve the effectiveness
    scale_color_manual(values = c("Quebec" = "blue", "Mississippi" = "red"), 
                       labels = c("Quebec", "Mississippi"),
                       breaks = unique(new_data$Type)) +
    theme_classic() + #complete theme first, so the tweaks below are not overwritten
    theme(
        plot.title = element_text(size = 14),
        panel.grid = element_blank()
    ) +
    labs(
        title = "Spaghetti Plot of CO2 Uptake vs. Concentration by Plant",
        x = "Concentration",
        y = "CO2 Uptake",
        color = "Plant Type", caption = "The 'N' and 'C' in the plant names refer to 'Nonchilled' and 'Chilled' treatments respectively."
    ) +
    facet_wrap(~ Plant)

Part 1:- The first thing the reader notices in this visualization is the difference in CO2 uptake between the two plant types, because of their different colors. Moreover, the reader can see a trend: as the concentration increases, the CO2 uptake also increases.

Part 2:- Faceting is an effective way to show this data. It helps to reduce overplotting and makes everything clearer. By arranging the plots side by side, it helps the reader to compare all the different combinations simultaneously.

Part 3:- Another visualization option is an interactive line chart. This would help to keep all the data in one plot. For example, by selecting a line in the graph we could see the treatment, the type of plant, and the precise concentration and CO2 uptake.

Question 8

# the dataset using in this visualiasation
head(tuesdata$forest_area)
#creating a subset of this data
forest_data<-tuesdata$forest_area%>%
  filter(entity%in%c("India","Australia","United Kingdom","Spain","Ireland","Yemen","Switzerland","Sweden","Sri Lanka","New Zealand"),
         year%in%2007:2020)

# Calculate the percentage change
forest_data <- forest_data %>%
  arrange(code, year) %>%
  group_by(code) %>%
  mutate(Percentage_Change = (forest_area - lag(forest_area)) / lag(forest_area) * 100)
#changing the NA values to 0
forest_data[is.na(forest_data)] <- 0
#creating a heat map for this subset data
ggplot(forest_data,aes(x=year, y = entity))+
  geom_tile(aes(fill = Percentage_Change))+ 
  scale_fill_gradientn(colors = brewer.pal(9, "Greens"),name="Change (%)")+ #the fill shows Percentage_Change
  theme(panel.background = element_rect(fill="white"))+
  labs(x="year",y="country",title = "Changes in Forest Coverage Across Selected Countries Over a Decade")+
  theme(plot.title = element_text(hjust=0.5))

Part 1:- The first thing the reader notices in this visualization is the forest area of the different countries, because of the sequential color scale. The plot immediately shows the forest area changes of countries like Australia because of the dark green color.

Part 2:- The audience can easily interpret the data and compare countries. However, the reader can't extract the actual data values from the visualization. The changes in forest area are very small throughout the decade, so we are not able to see the trend clearly.

Part 3:- Another visualization option is a time series line plot. It is the most common way to visualize time series, and we can clearly see the progression. We can facet it into separate charts by country, which makes comparison easier. We can see the small changes across the years very clearly.
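
The percentage change used for the heat map can be checked by hand on a tiny example. The sketch below uses made-up values for one hypothetical country (not the TidyTuesday data) and base R only; `forest_area`, `previous` and `pct_change` are illustrative names.

```r
# Made-up forest-area values for one hypothetical country, ordered by year
forest_area <- c(2.00, 2.10, 2.10, 1.89)

# Previous year's value; the first year has no predecessor, hence NA
previous <- c(NA, head(forest_area, -1))

# Year-on-year change relative to the previous year, in percent
pct_change <- (forest_area - previous) / previous * 100

print(round(pct_change, 1))
```

The first year is `NA` because there is nothing to compare it with. Replacing that `NA` with 0 makes the first tile read as "no change", which is worth keeping in mind when interpreting the first column of the heat map.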

Question 9

#the dataset used in this visualization
head(tuesdata$soybean_use)
# Cleaning the data
tuesdata$soybean_use
selected_countries<-tuesdata$soybean_use %>%filter(entity%in%c("Australia","Asia","Europe","Africa","Northern America","South America")) #trailing space removed from "Australia " so that it can match
soybean_summary <- selected_countries %>%
  dplyr::select(entity, where(is.numeric)) %>% # selecting the entity and other numeric values
  dplyr::select(-year) %>% # removing the column year from the dataset
  group_by(entity) %>% # grouping by entity
  summarise(across(human_food:processed, median)) %>% # finding the median values different type of usage 
  rename(country = entity) %>%#renaming it 
  mutate(across(-country, rescale))#rescaling it 0 to 1
soybean_summary
ggRadar(soybean_summary, aes(x = c(human_food,animal_feed,processed), 
                group = country),
            size = 1)  +facet_wrap(~country)+scale_y_discrete(breaks=NULL)+ggtitle("Soybean usage in different Continents")+theme(legend.position = "none")

Part 1:- The first thing the reader notices in this radar plot is the different polygons. The preattentive attributes used in a radar plot are the shape and area of the polygon, so the reader's primary attention goes to them.

Part 2:- A radar plot is a good choice for visualizing this data. It is good for comparison, it prompts the audience to remember the message, and it is accessible to general audiences.

Part 3:- Another choice of visualization is a grouped bar chart, in which we can effectively show all the variables. For example, each bar shows a different use of soybeans, and each group of bars represents a continent. Moreover, we can add the year by faceting.
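
The radar plot relies on every usage variable being rescaled to the range 0 to 1 before plotting. The following is a minimal base R sketch of that min-max rescaling on made-up numbers; `rescale01` is a hypothetical helper written for illustration and should behave like the default of `scales::rescale`, which is an assumption rather than something checked against the original output.

```r
# Min-max rescaling to [0, 1]; assumes x has at least two distinct values
rescale01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Made-up median usage values for three hypothetical regions
usage <- c(10, 25, 40)
scaled <- rescale01(usage)

print(scaled)
```

After rescaling, the smallest value in each column becomes 0 and the largest becomes 1, so the polygons compare relative rankings across regions rather than absolute tonnages.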

Question 10

Part 1

data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/13_AdjacencyDirectedWeighted.csv", header=TRUE)
head(data)
# short names
colnames(data) <- c("Africa", "EAsia", "Europe", "LatinAm.",   "NorthAm.",   "Oceania", "SAsia", "SEAsia", "Sov.Un.", "WAsia")
rownames(data) <- colnames(data)

# The data is in the form of an adjacency matrix, but we need it in a three column matrix instead: first column origin, second column destination, third column flow. We will use simple re-shaping commands for this:
data_long <- gather(rownames_to_column(data), key='To', value='value', -rowname)
#creating an alluvial plot
ggplot(data_long,aes(y=value,axis1=rowname,axis2=To))+
         geom_alluvium(aes(fill=rowname),width=1/12)+
         geom_stratum(width = 1/12)+#arranging the width of the stratum(axes)
  geom_text(stat = "stratum",aes(label=after_stat(stratum)),size=3)+#labeling each axes
         scale_x_discrete(limits=c("From","To"),expand = c(.05,.05))+
  scale_fill_brewer(type = "qual",palette = "Set3")+
  theme_minimal()+
  labs(fill="Countries",y="",title = "Migration in different countries")+
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),panel.grid = element_blank(),plot.title = element_text(hjust = 0.5))
## Warning in to_lodes_form(data = data, axes = axis_ind, discern =
## params$discern): Some strata appear at multiple axes.
## Warning in to_lodes_form(data = data, axes = axis_ind, discern =
## params$discern): Some strata appear at multiple axes.
## Warning in to_lodes_form(data = data, axes = axis_ind, discern =
## params$discern): Some strata appear at multiple axes.

Part 2:- The audience of the plot is people who are interested in migrating to other countries, policy makers and governments. It helps them understand migration within and outside a country, and it can help a government that is trying to reduce outward migration. It is static: all the information is revealed at once.

Part 3:- This plot uses some preattentive attributes to improve the visual communication. The title helps the reader understand what the visualization is about. A different color is used to represent each country, and the thickness of each flow represents the amount of migration. The labels From and To help the audience understand the direction.

Part 4:- In an alluvial plot we can only understand the flows and can't read off numerical values, so an alternative is a stacked bar plot in which each bar represents a country and each segment refers to a destination country. We can use colors to distinguish the segments in each bar. It clearly represents the data, including the numerical values, making it easier for viewers to understand than an alluvial plot.

Part 4:- An interactive radar plot would be a good option. Each point in the radar chart represents a country that people migrated to, and by faceting we can compare migration across all countries. By clicking on a point, we could see the number of people who migrated, their reason for leaving, and their reason for choosing that particular country. This visual is more useful and contains more information than a simple alluvial plot.

Example:- https://flourish.studio/blog/create-online-radar-spider-charts/
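
The reshaping step in Part 1 turns an adjacency matrix into three columns (origin, destination, flow). As a minimal sketch of the same idea without extra packages, the example below uses a made-up 2 x 2 matrix; the names `flows` and `flows_long` and the region labels "A" and "B" are hypothetical.

```r
# Made-up adjacency matrix: rows are origins, columns are destinations
flows <- matrix(c(0, 5,
                  3, 0),
                nrow = 2, byrow = TRUE,
                dimnames = list(from = c("A", "B"), to = c("A", "B")))

# as.table() followed by as.data.frame() gives one row per origin-destination pair
flows_long <- as.data.frame(as.table(flows),
                            responseName = "value",
                            stringsAsFactors = FALSE)

print(flows_long)
```

The long table has one row for every cell of the matrix, which is the shape that the alluvial plot, a stacked bar plot or a heat map would all start from.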

Question 11

attr1 <- read.csv("data/Farms/attr_farms.csv", stringsAsFactors = F) #load the farm attributes
edges <- read.csv("data/Farms/Edgelist_farms.csv", stringsAsFactors = F) #load the edge list
#creating an igraph object
head(edges)
net_edges<- graph_from_data_frame(edges,directed=T) #replaces the deprecated graph.data.frame()
head(net_edges)
## 6 x 120 sparse Matrix of class "dgCMatrix"
##   [[ suppressing 120 column names '1', '3', '2' ... ]]
##                                                                               
## 1  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## 3  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## 2  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## 5  . . . . . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## 15 . . . . . . . 3 . . . . . 2 . . . . . . . . . . . . . . . . . . . . . . . .
## 13 . . . . . . . . . 1 . . . . 2 1 . 1 . . . . . . . . . . . . . . . . . . . .
##                                                                               
## 1  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## 3  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . . .
## 2  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## 5  . . . . . . . . . . . . . . . . . . . . 1 . . . . . . . . . . . . . . . . .
## 15 . . . . . . . . . . . . . . . . . . . . 2 . . . . . . . . . . . . . 2 . . .
## 13 . . . . . . . . . . . . . . . . . . . . 1 . . . . . . . . . . . . . 1 . . .
##                                                                               
## 1  . . . . . . . 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## 3  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## 2  . . . . . . . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## 5  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## 15 . . . 1 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## 13 . . 1 . . . . . 1 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
##               
## 1  . . . . . .
## 3  . . . . . .
## 2  . . . . . .
## 5  . . . . . .
## 15 . . . . . .
## 13 . . . . . .
#we add herd size to each vertex
V(net_edges)$farm.size <- attr1$size[match(V(net_edges)$name,attr1$farm.id)]
#we add the farm type to each vertex
V(net_edges)$type <- as.character(attr1$type[match(V(net_edges)$name,attr1$farm.id)])
deg <- igraph::degree(net_edges, mode="total")
V(net_edges)$size<-deg*3
E(net_edges)$arrow.size<-0.02
V(net_edges)$label<-NA

#Creating a colour palette for representing the farm size (continuous variable)
color_palette <- colorRampPalette(c("white", "red"))
num_colors<- 100
color_values <- color_palette(num_colors)
min_size <- min(V(net_edges)$farm.size)
max_size <- max(V(net_edges)$farm.size)
color_index <- findInterval(V(net_edges)$farm.size, seq(min_size, max_size, length.out = num_colors))
V(net_edges)$color <- color_values[color_index]
# Adjust plot margins
par(mar=c(1,1,1,1))  

# Plot the network
plot.igraph(
  net_edges,
  layout = layout.fruchterman.reingold,
  vertex.size=8,
  vertex.color = V(net_edges)$color,  # Color vertices based on farm size
  edge.arrow.size = 0.3,  # Set arrow size for edges
  edge.color = "black",
  edge.width=E(net_edges)$weight/6,  # was misspelled as edng.width; assumes the edge list has a weight column
  main = "Cattle Network Visualization based on size"  # Add main title
)
# Add legend for farm size
legend("right", legend = c("Low Farm Size", "High Farm Size"),
       fill = color_palette(2), bty = "n", title = "Farm Size")

Part 2:- Most of the farms with a larger herd size are located between the smaller farms (they are more centrally located). We can see clusters around the big farms.

Part 3:- This visualization is mainly aimed at people interested in cattle farming, cattle farmers and agricultural policy makers. From this visualization, we can see that smaller farms are connected to larger farms, and that there are not many connections between smaller farms.

Part 4:- In this visualization, I used color to represent the size of the farms. I used a sequential color scale because it is a suitable option for representing quantitative data. There is an appropriate title for this visualization.

Part 5:- Another visualization option is a heat map. We can place the farms on the x and y axes. The intensity of each cell can represent the strength of the relationship, and we can also use a color scale to represent the size of the farm. The heat map would be a better option because it reduces clutter and makes comparisons possible, so it would be more useful for the reader.
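
The sequential colouring of the nodes works by cutting the range of farm sizes into equal-width bins and picking one colour per bin. The sketch below repeats that idea on made-up herd sizes with base R only; `farm_size`, `palette_fn`, `bins`, `idx` and `node_color` are illustrative names, not objects from the farm data.

```r
# Made-up herd sizes for five hypothetical farms
farm_size <- c(20, 50, 80, 200, 500)

# Palette running from white (small) to red (large)
palette_fn <- colorRampPalette(c("white", "red"))
n_colors <- 100
colors <- palette_fn(n_colors)

# Equal-width bin edges from the smallest to the largest farm
bins <- seq(min(farm_size), max(farm_size), length.out = n_colors)

# findInterval() returns, for each farm, the index of the bin it falls into
idx <- findInterval(farm_size, bins)
node_color <- colors[idx]

print(data.frame(farm_size, idx, node_color))
```

Because the bins are equal in width, a few very large farms push most of the smaller farms into the palest colours. Binning on a log scale would be one alternative if the small farms need to be told apart.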

Question 12

world <- ne_countries(scale = "medium", returnclass = "sf")
class(world)
## [1] "sf"         "data.frame"
# Selecting the map of Asia 
Asian_map<-world%>%filter(continent == "Asia")
#Plotting the map
ggplot(data = Asian_map) +
  geom_sf(aes(fill = income_grp)) +
  ggspatial::annotation_scale(location = "bl", width_hint = 0.3) + 
  ggspatial::annotation_north_arrow(
    location = "bl", 
    which_north = "true", 
    pad_x = unit(0.4, "in"), 
    pad_y = unit(0.5, "in"))+ggtitle("Asia's Economic Diversity: Visualizing Income Groups")+
  theme(panel.grid.major = element_line(color = gray(.5), linewidth = 0.5), 
        panel.background = element_rect(fill = "aliceblue"))+labs(fill = "Different Income groups")
## Scale on map varies by more than 10%, scale bar may be inaccurate

# Creating the bar plot
ggplot(data = Asian_map, aes(x = name, y = gdp_md,fill=income_grp)) +
  geom_bar(stat = "identity") +
  scale_y_log10() +#scaling the axis
  coord_flip()+#flipping the axis 
  theme_bw()+#complete theme first, so the tweak below is not overwritten
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  #the legend title is set through labs(fill = ...), not element_text()
  labs(x ="Country",y="GDP",fill="Different income group",title = "Average GDP of different countries in Asia")

Visualization 1 (the map):- The intended audience is policy makers and members of the general public who are interested in the global economy. It shows the income levels of the different countries on the Asian continent.

Visualization 2 (the bar chart):- The intended audience is policy makers, economists and members of the general public who are interested in the global economy. It compares the GDP of different countries in Asia according to their income level. These two visualizations provide good information together, because the income group is determined based on GDP and some other factors of a country. In these visualizations, people can compare different countries according to their GDP and income level. For example, some countries with a high GDP are classed as upper middle income or lower middle income, so economists can investigate that further.

Reference:-https://www.un.org/en/development/desa/policy/wesp/wesp_current/2014wesp_country_classification.pdf

Question 13

Part 1:-

# loading the datasets
tas_locations <- read.csv("whiskey_data/tas_locations.csv")
whiskey_sales <- read.csv("whiskey_data/whiskey_sales_tasmania.csv")
#Manipulating the data 
map_data <- tas_locations %>%
  left_join(whiskey_sales %>% count(producer, name = "producer_count"), by = c("place" = "producer")) %>%
  left_join(whiskey_sales %>% count(consumer, name = "consumer_count"), by = c("place" = "consumer"))
drop_na(map_data)
#converting the data to sf format 
tasmania_map <- st_as_sf(map_data, coords = c("lon", "lat"), 
    crs = 4326, agr = "identity")%>%
  mutate(lon = st_coordinates(.)[, 1],
         lat = st_coordinates(.)[, 2])
#creating the given map 
ggplot(data = world) +
  geom_sf(fill = "gray85",color="gray70") + 
  geom_sf(data = tasmania_map, aes(size = consumer_count, color = "consumer"),  alpha = 0.7) +#adding the consumer count to the map
  geom_sf(data = tasmania_map, aes(size = producer_count,color="producer"), alpha = 0.7)+
  geom_text_repel(data=tasmania_map ,aes(x=lon,y=lat, label=place), fontface = "bold")+#check_overlap is not a geom_text_repel parameter, so it is dropped
   scale_color_manual(values = c("consumer" = "skyblue", "producer" = "pink"),
                     labels = c("consumer" = "Consumer", "producer" = "Producer ")) +
  coord_sf(xlim = c(143.5, 153), ylim = c(-43.7, -37.7), expand = FALSE) +
  labs(x = "lon", y = "lat",
       size ="n",color="type") +
  theme(panel.background = element_rect(fill = 'grey95'),panel.grid.major = element_blank(),panel.grid.minor = element_blank(),axis.text.x = element_blank(),axis.text.y = element_blank(),axis.ticks = element_blank())#removing the grids and all labels from the map
## Warning: Removed 7 rows containing missing values or values outside the scale range
## (`geom_sf()`).
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_sf()`).
## Warning: ggrepel: 7 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

Part 2

#creating the given map 
ggplot(data = world) +
  geom_sf(fill = "gray85",color="gray70") + 
  geom_sf(data = tasmania_map, aes(size = consumer_count, color = "consumer"),  alpha = 0.7) +#adding the consumer count to the map
  geom_sf(data = tasmania_map, aes(size = producer_count,color="producer"), alpha = 0.7)+
  geom_text_repel(data=tasmania_map ,aes(x=lon,y=lat, label=place), fontface = "bold")+#check_overlap is not a geom_text_repel parameter, so it is dropped
   scale_color_manual(values = c("consumer" = "skyblue", "producer" = "pink"),
                     labels = c("consumer" = "Consumer", "producer" = "Producer ")) +
  coord_sf(xlim = c(143.5, 153), ylim = c(-43.7, -37.7), expand = FALSE) +
  labs(x = "lon", y = "lat",
       size ="n",color="type") +
  theme(panel.background = element_rect(fill = 'grey95'),panel.grid.major = element_blank(),panel.grid.minor = element_blank(),axis.text.x = element_blank(),axis.text.y = element_blank(),axis.ticks = element_blank())#removing the grids and all labels from the map
## Warning: Removed 7 rows containing missing values or values outside the scale range
## (`geom_sf()`).
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_sf()`).
## Warning: ggrepel: 7 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

#Improved visualisation
ggplot(data = world) +
  geom_sf(color = "black") + 
  geom_sf(data = tasmania_map, aes(size = consumer_count, color = "consumer"), alpha = 0.7) +
  geom_sf(data = tasmania_map, aes(size = producer_count, color = "producer"), alpha = 0.7) +
  geom_text_repel(data = tasmania_map, aes(x = lon, y = lat, label = place), fontface = "bold") +
     ggspatial::annotation_scale(location = "br", width_hint = 0.3) +
     ggspatial::annotation_north_arrow(location = "br", which_north = "true", 
        pad_x = unit(0.3, "in"), pad_y = unit(0.3, "in"),style = ggspatial::north_arrow_nautical())+
  scale_color_manual(values = c("consumer" = "blue", "producer" = "pink"),
                     labels = c("consumer" = "Consumer", "producer" = "Producer")) +
  coord_sf(xlim = c(143.5, 153), ylim = c(-43.7, -37.7), expand = FALSE) +
  labs(x = "Longitude", y = "Latitude", size = "Count", color = "Type",title = "Whiskey Producers and Consumers in Tasmania") +
  annotate(geom = "text", x = 146, y = -40, label = "Bass Strait", fontface = "italic", color = "grey22", size = 6) +theme_minimal()+
  theme(panel.background = element_rect(fill = "aliceblue"),plot.title = element_text(face = "bold"))
## Warning: Removed 7 rows containing missing values or values outside the scale range
## (`geom_sf()`).
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_sf()`).
## Warning: ggrepel: 7 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

The audiences of these visualizations are stakeholders in the whiskey industry in Tasmania, researchers and anyone who is interested in the producers and consumers of whiskey in Tasmania.

The message of both visualizations is the same. They show the locations of the major whiskey producers and consumers in Tasmania, Australia. In these visualizations, size is used to represent the count of whiskey producers/consumers (continuous variable) and color is used to distinguish producers from consumers (discrete variable). Moreover, a north arrow is given to identify the direction, and a proper title is given to identify what the visualization is about. An annotation is also given on the map (Bass Strait, which separates the Australian mainland from the island of Tasmania).

Part 3

# Reshape the data to long format for a lollipop plot
lollipop_data <- map_data %>%
  pivot_longer(cols = c(producer_count, consumer_count),
               names_to = "type", values_to = "count")
# Drop rows with a missing count; the result must be assigned back to take effect
lollipop_data <- drop_na(lollipop_data, count)
# Create lollipop plot
ggplot(lollipop_data, aes(x = place, y = count, color = type)) +
  geom_segment(aes(x = place, xend = place, y = 0, yend = count)) +
  geom_point(size = 4, alpha = 0.7) +
  labs(title = "Whiskey Producer and Consumer in Tasmania",
       x = "City", y = "Count",
       color = "Type") +
  coord_flip() +
  theme_minimal()

The lollipop plot gives a more precise view of whiskey production and consumption in Tasmania, because counts are easier to compare along a common axis than as point sizes on a map. It is aimed mainly at owners of whiskey production companies and other stakeholders, who could use it to decide where to open new production units, for example in places with many consumers and few producers.
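The idea of looking for places with many consumers and few producers can be sketched as a simple ranking. This is only an illustration: the data frame `toy` and all of its counts are made up, and the column `gap` is a hypothetical helper. The real values live in `map_data`, which is not reproduced here.

```r
# Hypothetical counts for illustration only; not the values from map_data
toy <- data.frame(
  place          = c("Hobart", "Launceston", "Devonport", "Burnie"),
  producer_count = c(5, 2, 0, 1),
  consumer_count = c(9, 8, 4, 1),
  stringsAsFactors = FALSE
)

# Gap between consumers and producers: a large positive gap
# points to a place with many consumers and few producers
toy$gap <- toy$consumer_count - toy$producer_count

# Rank places from the largest gap to the smallest
ranked <- toy[order(-toy$gap), ]
print(ranked)
```

A ranking like this is a starting point rather than a decision rule, since it ignores factors such as population and transport links.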

Question 14

#loading the dataset of each plant
invasive_plant_1 <- raster("invasive plant/sumrast_allassumptions.avg_Alliaria petiolata.tif")
invasive_plant_2 <- raster("invasive plant/sumrast_allassumptions.avg_Cirsium arvense.tif")

# Define breaks
brk <- c(0, 0.2, 0.4, 0.6, 0.8, 1) # numeric breaks for likelihood classes
# Six breaks define five classes, so the palette needs five colours
pal <- terrain.colors(length(brk) - 1)

# Plot for Alliaria petiolata
plot(invasive_plant_1, breaks = brk, col = pal,
     main = "Distribution of Alliaria petiolata",
     legend.args = list(text = "likelihood of finding species", side = 4, font = 2, line = 2.5, cex = 0.8),
     xlab = "Longitude", ylab = "Latitude")

# Plot for Cirsium arvense
plot(invasive_plant_2, breaks = brk, col = pal,
     main = "Distribution of Cirsium arvense",
     legend.args = list(text = "likelihood of finding species", side = 4, font = 2, line = 2.5, cex = 0.8),
     xlab = "Longitude", ylab = "Latitude")

The numeric breaks help readers see the distribution clearly. I then used color to represent the likelihood of finding each plant, which helps the reader spot hot-spots where the species is more likely to occur. Each plot also has a descriptive title to reinforce the message of the visualization.
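To make the effect of the numeric breaks concrete, the sketch below bins a few made-up likelihood values into the same classes using base R's `cut()`. The vector `likelihood` is hypothetical, and `cut()` is used here only as an analogy: how the raster plotting method treats cells that fall exactly on a break is not checked here.

```r
# Same breaks as in the maps: six break points define five classes
brk <- c(0, 0.2, 0.4, 0.6, 0.8, 1)

# Hypothetical likelihood values for illustration only
likelihood <- c(0.05, 0.2, 0.35, 0.61, 0.99)

# Assign each value to a class; include.lowest keeps 0 in the first class
classes <- cut(likelihood, breaks = brk, include.lowest = TRUE)
print(table(classes))

# One colour per class
palette_5 <- terrain.colors(length(brk) - 1)
print(palette_5)
```

Binning a continuous likelihood into a handful of classes trades detail for readability, which is usually the right choice when the goal is to show hot-spots at a glance.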

Question 15

# Convert each raster to a data frame
plant_1 <- as.data.frame(invasive_plant_1)
plant_2 <- as.data.frame(invasive_plant_2)
# Combine the two data frames column-wise (requires both rasters to have the same number of cells)
combined_data <- cbind(plant_1, plant_2)
# Give the columns readable species names
combined_data <- combined_data %>%
  rename(Alliaria_petiolata = sumrast_allassumptions.avg_Alliaria.petiolata,
         Cirsium_arvense = sumrast_allassumptions.avg_Cirsium.arvense)
# Remove rows with NA values
combined_data <- drop_na(combined_data)
# Convert to long format: one row per cell and species
boxplot_data <- pivot_longer(combined_data,
                             cols = everything(),   # pivot all columns
                             names_to = "Species",  # new column holding the species name
                             values_to = "Value")   # new column holding the likelihood

# Create a boxplot; outliers = FALSE hides outlier points (requires ggplot2 >= 3.5.0)
ggplot(boxplot_data, aes(x = Species, y = Value, fill = Species)) +
  geom_boxplot(outliers = FALSE) +
  labs(title = "Distribution of two invasive species",
       y = "Likelihood of finding the species") +
  theme_bw()

For the recommendations, the boxplot shows that the likelihood of finding Cirsium arvense is generally much higher than that of Alliaria petiolata. The map also shows that Cirsium arvense is spread more widely across Minnesota than Alliaria petiolata. My recommendation for the Minnesota Department of Natural Resources is therefore to implement measures to control the spread of Cirsium arvense, since the data suggest it is the more invasive of the two.
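The comparison behind this recommendation can also be expressed as numbers. The sketch below computes the median and mean likelihood per species on a small made-up data frame, `toy_long`, which has the same `Species` and `Value` columns as `boxplot_data`. The values are hypothetical and do not come from the rasters.

```r
# Hypothetical long-format data for illustration only
toy_long <- data.frame(
  Species = rep(c("Alliaria_petiolata", "Cirsium_arvense"), each = 5),
  Value   = c(0.05, 0.10, 0.10, 0.20, 0.30,
              0.40, 0.55, 0.60, 0.70, 0.90),
  stringsAsFactors = FALSE
)

# Median and mean likelihood per species
med <- tapply(toy_long$Value, toy_long$Species, median)
avg <- tapply(toy_long$Value, toy_long$Species, mean)

# Collect both summaries in one small table
summary_tbl <- data.frame(Species = names(med),
                          median  = as.numeric(med),
                          mean    = as.numeric(avg))
print(summary_tbl)
```

Applying the same two `tapply()` calls to `boxplot_data` would give figures to quote alongside the boxplot.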